The problem

Having gained some intuition about simulated annealing, let us now try to apply our knowledge to tree searches [9]. Given a game position, we are looking for the best move. Tree searches are similar to combinatorial optimization problems in that an extremely large number of alternatives has to be considered for an exhaustive search.

Let us first focus on aspects of tree searches which lead to the failure of some natural suggestions for the implementation of simulated annealing, before arguing why the suggestion given in the introduction could work in principle. Faced with a game tree which is far too large for complete exploration, some sort of pruning becomes essential. There are different algorithmic techniques which allow one to ignore parts of the game tree, for example α-β pruning. In fact, the evaluation function which is invoked at the deepest level of the tree to find the value of a position is also a form of pruning: the outcome of the remainder of the tree is estimated by applying game heuristics. For example, certain patterns are good because they have proven to be so in previous games, and increasing one's influence on the board is advantageous because experience shows that influence helps in the long run (down the tree).

A first proposal could therefore be to apply simulated annealing to pruning: at each branching, evaluate the move possibilities from go knowledge, but pursue moves only with a certain probability. This is actually an old idea, and I do not believe that the exponential probabilities from simulated annealing would give anything useful. The main problem is that the game tree of go is still far too large even if only two moves are considered at each level. Go knowledge remains essential.
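To make the rejected proposal concrete, here is a minimal sketch of probabilistic pruning with Boltzmann weights. The `evaluate` function is a hypothetical stand-in for go knowledge; the temperature `T` controls how aggressively weaker candidates are discarded.

```python
import math
import random

def prune_moves(moves, evaluate, T):
    """Keep each candidate move with a Boltzmann probability based on its
    heuristic value relative to the best candidate.  `evaluate` is a
    hypothetical stand-in for a heuristic derived from go knowledge."""
    values = {m: evaluate(m) for m in moves}
    best = max(values.values())
    # the best move always survives, since exp(0) = 1 > random() in [0, 1)
    return [m for m in moves
            if random.random() < math.exp((values[m] - best) / T)]
```

At low temperature this reduces to greedy pruning; but as the text argues, even cutting the branching factor to two leaves the go tree hopelessly large.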

Getting back to combinatorial optimization, where simulated annealing succeeded in optimizing long lists of objects: to find the best move really means to find the best move under optimal play by both players, i.e. we want to optimize a sequence of moves. Maybe we can consider entire games as the object of optimization? This would make the evaluation trivial, since at the end of a game counting gives a precise value.
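As a sketch only, annealing over complete games would look like a standard simulated annealing loop whose configuration is a whole move sequence. The `perturb` and `score` functions here are hypothetical stand-ins: `perturb` proposes a local change to the game record, and `score` counts the final position.

```python
import math
import random

def anneal_game(initial_moves, perturb, score,
                T0=1.0, cooling=0.999, steps=10000):
    """Simulated annealing where the configuration is an entire game
    record.  `perturb` proposes a local change to the move sequence and
    `score` evaluates the finished game by counting; both are
    hypothetical stand-ins for illustration."""
    moves, value = initial_moves, score(initial_moves)
    T = T0
    for _ in range(steps):
        candidate = perturb(moves)
        v = score(candidate)
        # accept improvements always, deteriorations with Boltzmann probability
        if v > value or random.random() < math.exp((v - value) / T):
            moves, value = candidate, v
        T *= cooling
    return moves, value
```

The loop itself is unproblematic; the difficulty, discussed next, lies in what "local change" and "value" can mean when two players respond optimally to each other.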

Game trees are, however, profoundly different from the traveling salesman problem because there are two competing parties playing the game (trees are not paths). This fact makes it impossible to find local moves in the configuration space of the type used in the traveling salesman problem. We are really dealing with the optimization of two sequences of moves, one for Black, one for White. The problem is that any local change in the order of moves that Black makes has a non-local effect on the game tree. Whenever Black chooses one move over another at a given point in the game, the optimal play for both players will in general change completely for the remainder of the game. Think of the value of a move for Black as a function of the order in which Black intends to play his moves. For a numbered list of such orderings, the value for ordering 1, 2,… is likely to look like a sequence of random numbers because of the different, optimal responses by White. It seems that simulated annealing cannot be applied in this case, since there are no local moves and the value function behaves so discontinuously under changes in the list of moves.